Information processing apparatus and information processing method

ABSTRACT

A probability at which a target object takes a target object state is acquired for each of target object states that the target object is allowed to take, and a distribution of the probabilities is acquired. A success rate is acquired, for each relative target object state determined in advance for a position and orientation of an image capturing device, at which the target object is successfully identified from a captured image obtained by capturing the target object having the relative target object state, and a distribution of the success rates is acquired. A position and orientation that the image capturing device is to take are determined based on the distribution of the success rates acquired for each of a plurality of positions and orientations that the image capturing device is allowed to take, and the distribution of the probabilities.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for identifying a target object with high accuracy.

2. Description of the Related Art

As one identification method, studies have been made extensively on controlling a computer to learn feature amounts extracted from an image of a target object obtained from an image capturing device, and identifying the type of an object which appears in an input image. Studies have also been made on estimating the position and orientation of an object simultaneously with its type using model information or the like of the object. As an application of that technique, position/orientation identification (recognition) of parts, which controls a robot to execute operations such as advanced assembling, is known.

Non-patent literature 1 (B. Leibe, "Robust Object Detection with Interleaved Categorization and Segmentation", IJCV Special Issue on Learning for Vision for Learning, August 2007) has proposed a method of estimating the central position of an object by probabilistic voting, by associating features which are code-booked from learning images with detected features (implicit shape model). With this method, not only the type but also the position of an object can be estimated.

In patent literature 1 (Japanese Patent Laid-Open No. 2008-257649), feature points are extracted from an input image to calculate their feature amounts, and feature points similar to feature amounts in learning images are set as corresponding points. Then, by voting to reference points based on the feature amounts (including position information) of the feature points of the learning images for the respective corresponding points in the input image, a target object is identified and its position is estimated.

Also, techniques have been studied for speeding up processing and enhancing its accuracy when information about the states of target objects such as a pile of parts is acquired using a sensor such as a camera, the positions and orientations of the respective target objects are estimated from the acquired information such as an image, and the target objects are sequentially picked up by gripping them with a robot.

Patent literature 2 (Japanese Patent No. 4238256) has proposed a method of generating a virtual pile, and simulating robot operations based on a virtual captured image of that pile. A pile state is assumed by randomly generating orientations of a plurality of target objects using model data such as CAD data of the target objects, and operations for handling a target object by a robot are simulated.

In patent literature 3 (Japanese Patent No. 3300092), values which can be assumed by parameters that define the position and orientation of a target object are stochastically predicted, and a region (ROI) where features that define the target object exist on the screen, or a region where those features exist on a parameter space, is limited according to the prediction result.

Patent literature 4 (Japanese Patent Laid-Open No. 2007-245283) shortens the processing time by selecting the orientation of a work from orientations limited to a plurality of stable orientations upon estimating the orientation of the work. Patent literature 5 (Japanese Patent Laid-Open No. 2010-186219) shortens the processing time by calculating degrees of stability for respective orientations of a work, and inhibiting the use of templates which express orientations of low degrees of stability.

When the positions and orientations of target objects such as a pile of parts are to be estimated, since the positions and orientations of the target objects vary, feature amounts obtained from images of the target objects viewed from various viewpoints have to be learned. However, it is often difficult to identify an image of a target object obtained at a certain viewpoint, and it is difficult to raise the identification accuracies of images of the target objects obtained at all viewpoints.

The reason why the identification accuracies differ for respective positions and orientations depending on the target object is that a feature portion helpful for identifying the target object cannot always be obtained on an image when images of that target object are captured from all viewpoints. For this reason, when the positions and orientations of target objects such as a pile of parts in a factory or the like are to be estimated, the position and orientation of a camera are required to be determined so as to improve the identification accuracies.

In patent literature 2, CG models of target objects are virtually piled up so as to simulate teaching of robot operations. However, patent literature 2 does not include any description of improving the identification accuracies.

In patent literature 3, the position and orientation of a target object are stochastically predicted. However, patent literature 3 does not include any description of determining the position and orientation of a camera so as to improve the identification accuracy.

In patent literature 4, the orientation of a target object to be estimated is limited to those around stable orientations. However, patent literature 4 does not include any description of determining the position and orientation of a camera so as to improve the identification accuracy.

In patent literature 5, templates are generated using degrees of stability of orientations, but the degrees of stability are not used to obtain an accurate estimation result of an orientation. That is, patent literature 5 does not consider reducing estimation errors of an orientation by using degrees of stability in estimating the orientation.

SUMMARY OF THE INVENTION

The present invention has been made in consideration of the aforementioned problems, and provides a technique for determining the position and orientation of an image capturing device so as to enhance the identification accuracy of a target object in an image captured using the image capturing device.

According to the first aspect of the present invention, an information processing apparatus comprises: a first acquisition unit configured to acquire a probability at which a target object takes a target object state for each of target object states that the target object is allowed to take, and configured to acquire a distribution of the acquired probabilities; a second acquisition unit configured to acquire a success rate, for each relative target object state being determined in advance for a position and an orientation of an image capturing device, at which the target object is successfully identified from a captured image obtained by capturing, by the image capturing device, the target object having the relative target object state, and configured to acquire a distribution of the acquired success rates; and a determination unit configured to determine a position and orientation that the image capturing device is to take based on the distribution of the success rates acquired by the second acquisition unit for each of a plurality of positions and orientations that the image capturing device is allowed to take, and the distribution of the probabilities acquired by the first acquisition unit.

According to the second aspect of the present invention, an information processing method comprises: a first acquisition step of acquiring a probability at which a target object takes a target object state for each of target object states that the target object is allowed to take, and of acquiring a distribution of the acquired probabilities; a second acquisition step of acquiring a success rate, for each relative target object state being determined in advance for a position and an orientation of an image capturing device, at which the target object is successfully identified from a captured image obtained by capturing, by the image capturing device, the target object having the relative target object state, and of acquiring a distribution of the acquired success rates; and a determination step of determining a position and orientation that the image capturing device is to take based on the distribution of the success rates acquired in the second acquisition step for each of a plurality of positions and orientations that the image capturing device is allowed to take, and the distribution of the probabilities acquired in the first acquisition step.

Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an example of the functional arrangement of an information processing apparatus and its peripheral device;

FIG. 2 is a flowchart of processing to be executed by an information processing apparatus 200;

FIG. 3 is a view showing the outer appearance of a system including the information processing apparatus 200;

FIG. 4 is a view for explaining representative orientations;

FIGS. 5A to 5E are views showing states of a target object 31 at five different representative orientations;

FIGS. 6A and 6B are views for explaining target objects on a tray 60;

FIG. 7 is a view for explaining a generation method of a pile of target objects by simulation;

FIG. 8 is a view for explaining the relationship between a world coordinate system and a camera coordinate system;

FIG. 9 is a flowchart of processing at the time of learning;

FIG. 10 is a flowchart of processing in step S910;

FIG. 11 is a flowchart of identification processing;

FIG. 12 is a view showing an example of a voting space;

FIG. 13 is a block diagram showing an example of the functional arrangement of an information processing apparatus and its peripheral device;

FIG. 14 is a flowchart of processing to be executed by the information processing apparatus 200;

FIG. 15 is a view showing an example of a target object; and

FIG. 16 is a view showing an example of a target object.

DESCRIPTION OF THE EMBODIMENTS

Embodiments of the present invention will be described hereinafter with reference to the accompanying drawings. Note that each embodiment to be described hereinafter is an example of practically carrying out the present invention, and is one practical embodiment of the arrangements described in the scope of the claims.

First Embodiment

When the positions and orientations of target objects such as a pile of parts in a factory or the like are to be estimated, the situation in which models (three-dimensional virtual objects) of the target objects and the target objects themselves are placed is often known in advance. In this case, by simulating the situation in which pieces of model information of the target objects and the target objects are placed, a prior distribution of the positions and orientations of the target objects can be learned. Also, by defining the position and orientation of an image capturing device used to capture an image of the target objects, a virtual image of the target objects can be simulated, and the identification accuracies for respective positions and orientations of the target objects can be predicted.

An example of the functional arrangement of an information processing apparatus according to this embodiment and its peripheral device will be described below with reference to the block diagram shown in FIG. 1.

An image capturing unit 100 captures still images and moving images, and is used to capture images of target objects in this embodiment. The position and orientation of the image capturing unit 100 are controlled by an information processing apparatus 200, and images captured by the image capturing unit 100 are input to the information processing apparatus 200.

The information processing apparatus 200 includes an image capturing unit control unit 230, an occurrence probability storage unit 210, a target object state identification unit 240, and an identification reliability storage unit 220. The functions of the respective units will be described below with reference to FIG. 2, which shows the flowchart of the processing to be executed by the information processing apparatus 200. Note that information is stored in an appropriate memory in the information processing apparatus 200 unless otherwise defined.

In step S110, the image capturing unit control unit 230 acquires, for each of the target object states that a target object can take on a world coordinate system, a probability at which the target object takes that state, where a target object state is defined by a position and/or an orientation of the target object; the unit thereby acquires a distribution of these probabilities (first acquisition).

For example, this distribution may be stored in advance in the occurrence probability storage unit 210, and the image capturing unit control unit 230 may read it out. Alternatively, the image capturing unit control unit 230 may calculate a new distribution. In this embodiment, assume that the image capturing unit control unit 230 calculates a new distribution.

In step S120, the image capturing unit control unit 230 operates as follows. That is, for each relative target object state determined in advance for the position and orientation of the image capturing unit 100, the image capturing unit control unit 230 acquires a success rate (identification success rate, identification reliability) at which the target object is successfully identified from a captured image obtained by capturing, by the image capturing unit 100, the target object having that relative target object state; the unit thereby acquires a distribution of these success rates (second acquisition).

For example, this distribution may be stored in advance in the identification reliability storage unit 220, and the image capturing unit control unit 230 may read it out. Alternatively, the image capturing unit control unit 230 may calculate a new distribution, or the distribution may be calculated by reflecting identification results online. In this embodiment, assume that the image capturing unit control unit 230 calculates a new distribution.

In step S130, using the distribution acquired in step S110 and that acquired in step S120, the image capturing unit control unit 230 determines one of a plurality of positions and orientations that the image capturing unit 100 can take as the desired position and orientation (that is, one which enhances the identification accuracy). Then, the image capturing unit control unit 230 updates the position and orientation of the image capturing unit 100 to the determined position and orientation.

Note that in step S130, the relative position and orientation between the coordinate space (world coordinate system) on which the probabilities of occurrence of a target object are defined and that (camera coordinate system) on which the identification success rates are defined can be changed so as to enhance the identification accuracy. For this purpose, when the position of the image capturing unit 100 is determined in advance, the position (layout condition) of the target object may be changed instead. For example, when all target objects are placed on a tray or the like, the position and height of the tray may be changed to change the relative position with respect to the image capturing unit 100.

In step S140, the target object state identification unit 240 acquires an image captured by the image capturing unit 100, the position and orientation of which have been changed, and identifies the target object states of the target objects which appear in the acquired image. Note that an object to be identified is not limited to a specific target object.

A practical arrangement example of a system including the information processing apparatus according to this embodiment will be described below with reference to the outer appearance view shown in FIG. 3. In FIG. 3, the same reference numerals denote the same components shown in FIG. 1, and a description thereof will not be repeated.

The image capturing unit 100 captures an image of a pile of target objects 30. A robot 20 is used to pick up a target object 30 from the pile of target objects 30. For example, the position and orientation identified by the information processing apparatus 200 are sent to the robot 20; the robot 20 controls its arm according to the position and orientation received from the information processing apparatus 200, and picks up that target object.

Details of the processes in the respective steps shown in FIG. 2 will be described below. Details of the process in step S110 will be described first. In this case, an orientation is defined over the full 360° range, and this embodiment handles representative orientations obtained by discretely sampling that range. In the case of a joint object or a deformed object, a joint angle or a deformation angle can be handled as a variable which determines a target object state, but such variables will be explained later in the subsequent embodiments.

In the case of a pile of target objects, the distribution of orientations that the target objects can take is biased depending on positions and heights within the pile. Hence, the position and orientation of the image capturing unit 100 are controlled to enhance the identification accuracy in consideration of that bias.

Representative orientations will be described below with reference to FIG. 4. In this embodiment, representative orientations are calculated using a geodesic dome. The geodesic dome, a known method, uniformly and discretely expresses a spherical surface by recursively dividing each triangular surface element of a regular polyhedron into triangles of identical areas. If the center of the geodesic dome is considered as a target object 31, the vertices 50 of the regular polyhedron obtained by the geodesic dome can respectively be considered as viewpoints which look down at the target object 31 from various positions, and the vertices and plane central points of the regular icosahedron can be used as the viewpoints.

In this embodiment, the number of vertices of the regular icosahedron is 16 and the number of planes is 20, and the orientations when the target object 31 is viewed from a total of 36 viewpoints are defined as representative orientations. Furthermore, for each representative orientation, the in-plane rotation when viewed from that direction has to be considered. For example, when the in-plane rotation is discriminated at a granularity of 18° increments, 20 different in-plane rotation orientations exist. Therefore, in this case, there are 720 (=36×20) different representative orientations. FIGS. 5A to 5E respectively show states of the target object 31 at five of the representative orientations defined in FIG. 4.

Each piled target object 31 may face in any direction with respect to the image capturing unit 100, and may be rotated in-plane in any of those directions. For this reason, 720 different orientations have to be detected. Therefore, in this embodiment, the probability of occurrence is acquired for each of the 720 different orientations. Alternatively, the probability of occurrence may be acquired for only the 36 viewpoint orientations regardless of in-plane rotation; in that case, detection is made by in-plane-rotating the image at the time of identification.

An example of a method of calculating the probability of occurrence for each target object state will be described below. Three-dimensional virtual object data (CAD data or the like) of a target object is stored in advance in a memory such as the occurrence probability storage unit 210, and the image capturing unit control unit 230 forms a three-dimensional virtual object using this data. Then, the image capturing unit control unit 230 randomly generates rotation angles for the three axes of the model coordinate system, and selects, from the plurality of representative orientations, the representative orientation closest to the generated rotation angles. Furthermore, a count value for the selected representative orientation is counted up. Thus, a selection count can be calculated for each of the plurality of representative orientations, and this selection count is divided by the generation count of rotation angles, thus calculating the probability of occurrence. Note that in a situation in which a target object is conveyed by a belt conveyor, since the target object has a stable orientation with respect to the floor or the like, rotation angles are generated so as to achieve the stable orientation. In this manner, depending on the situation, instead of generating rotation angles completely at random, rotation angles are generated randomly within a certain range or based on given rules. Also, all positions may be set to "0", or positions may be generated completely at random, randomly within a certain range, or based on given rules.
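The counting procedure above amounts to a Monte Carlo estimate. The following is a minimal sketch, assuming the representative orientations are supplied as a list of 3×3 rotation matrices; the uniform-quaternion sampler corresponds to the "completely at random" case, and the function names are illustrative, not the patent's prescribed implementation.

```python
import numpy as np

def random_rotation(rng):
    """Draw a uniformly random 3D rotation matrix (Shoemake's quaternion method)."""
    u1, u2, u3 = rng.random(3)
    x = np.sqrt(1 - u1) * np.sin(2 * np.pi * u2)
    y = np.sqrt(1 - u1) * np.cos(2 * np.pi * u2)
    z = np.sqrt(u1) * np.sin(2 * np.pi * u3)
    w = np.sqrt(u1) * np.cos(2 * np.pi * u3)
    return np.array([[1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
                     [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
                     [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)]])

def rotation_distance(Ra, Rb):
    """Geodesic angle between two rotations."""
    c = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))

def occurrence_probabilities(representatives, n_samples=100_000, seed=0):
    """Count how often a random orientation falls nearest to each representative,
    then divide by the generation count to obtain probabilities of occurrence."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(representatives))
    for _ in range(n_samples):
        R = random_rotation(rng)
        nearest = min(range(len(representatives)),
                      key=lambda k: rotation_distance(R, representatives[k]))
        counts[nearest] += 1
    return counts / n_samples
```

For the belt-conveyor case mentioned above, one would restrict the sampler to rotations near the stable orientation instead of drawing uniformly.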

Also, since this embodiment considers a situation in which target objects are piled, the method of calculating the probability of occurrence when target objects are piled on a work area (tray) 60, as shown in FIGS. 6A and 6B, will be described below. At this time, the Z direction of the world coordinate system is set to agree with the height direction of a piled target object set 61.

FIG. 6A shows a state when the target object set 61 is viewed from above (the Z direction), and FIG. 6B shows a state when the target object set 61 is viewed sideways (the Y direction). The piled target object set 61 is generated by simulation using three-dimensional virtual objects of the target objects, and the representative orientations corresponding to the orientations of the respective target objects are stored, thereby storing the probability of occurrence of the orientations of the piled target objects. A practical method is disclosed in patent literature 2 described in the description of the related art.

In the generation method of piled target objects by simulation, as shown in FIG. 7, rotation angles with respect to the three axes of the model coordinate system are randomly generated to define an orientation, and a target object is moved downward from a random position within the work area (tray), thereby virtually piling target objects. More specifically, X and Y values on the world coordinate system are randomly determined from the domain which defines the work area, and a sufficiently large value is set as the Z value. A predetermined number of target objects 31, defined in advance, are moved down to generate virtually piled target objects 30. In this case, the target objects are moved down in turn so that each target object comes to rest stably. A plurality of virtual piles are generated, and an orientation distribution P_(P)(X, Y, Z, θx, θy, θz) on the world coordinate system is stored in the occurrence probability storage unit 210. This distribution P_(P)(X, Y, Z, θx, θy, θz) is a distribution of probabilities at which a target object takes an orientation (θx, θy, θz) at an arbitrary position (X, Y, Z). θx, θy, and θz are respectively the rotation angles about the X-, Y-, and Z-axes of the world coordinate system. The definition of an orientation may use a single rotation expression given by a rotation axis and a rotation angle about that axis, or other expressions.
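A structural sketch of this drop loop follows. The rigid-body simulation that patent literature 2 describes is abstracted behind a hypothetical helper `settle_onto_pile`; its name and behavior are assumptions, and the pose representation is illustrative.

```python
import numpy as np

def generate_pile(n_objects, tray_x, tray_y, drop_height, settle_onto_pile, rng):
    """Virtually pile target objects: random pose above the tray, then drop.

    settle_onto_pile(pose, pile) -> pose is a stand-in for a rigid-body
    simulation step (hypothetical helper, not a real library call).
    """
    pile = []
    for _ in range(n_objects):
        x = rng.uniform(*tray_x)              # random X within the work area
        y = rng.uniform(*tray_y)              # random Y within the work area
        theta = rng.uniform(0, 2 * np.pi, 3)  # random rotations about the three model axes
        pose = {"position": np.array([x, y, drop_height]), "rotation": theta}
        pile.append(settle_onto_pile(pose, pile))  # drop until the object rests stably
    return pile  # each settled pose contributes one sample to P_P(X, Y, Z, θx, θy, θz)
```

Repeating this over many virtual piles and histogramming the settled poses yields the stored distribution P_(P).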

In this way, the method of acquiring the probability of occurrence for the respective target object states, the target object state expression method, and the like are not limited to specific methods (expression methods), and various methods can be used. That is, arbitrary methods can be used as long as they can acquire a distribution of probabilities at which a target object takes a target object state, for the target object states that the target object can take.

Note that when such a distribution is stored in advance in the occurrence probability storage unit 210, the aforementioned processing is executed in advance, and the distribution obtained by this processing is stored in the occurrence probability storage unit 210.

Details of the process in step S120 will be described below. The process in step S120 is roughly classified into two steps, that is, an image generation step and a reliability calculation step. The image generation step will be described first.

In this embodiment, the distribution of success rates for the relative target object states, which are determined in advance for the position and orientation of the image capturing unit 100, is calculated. That is, whereas the distribution of probabilities is calculated from the probabilities of the respective target object states on the world coordinate system, the distribution of success rates is calculated from the success rates for the respective target object states on the camera coordinate system, with reference to the position and orientation of the image capturing unit 100. Alternatively, these distributions may be defined on the image coordinate system.

The relationship between the world coordinate system and the camera coordinate system will be described below with reference to FIG. 8. The world coordinate system has one point on the real space as an origin, and defines three axes which are orthogonal to each other at this origin as the X-, Y-, and Z-axes, respectively. The world coordinate system 810 shown in FIG. 8 has the base of the arm of the robot 20 as its origin. On the other hand, the camera coordinate system has one point on the image capturing unit 100 as an origin, and defines three axes which are orthogonal to each other at this origin as the X-, Y-, and Z-axes, respectively. The camera coordinate system 800 shown in FIG. 8 has the optical center of the image capturing unit 100 as its origin.

Note that the following description will be given under the assumption that the position and orientation of the tray on which the target objects are placed on the world coordinate system, and the position and orientation of the camera coordinate system on the world coordinate system, are given (have been calibrated).

Let P_(R)(XC, YC, ZC, θcx, θcy, θcz) be the distribution of success rates defined on the camera coordinate system 800, and P_(R)(x, y, θcx, θcy, θcz) be the distribution of success rates defined on the image coordinate system. An orientation indicates rotation angles about the X-, Y-, and Z-axes of the camera coordinate system, as described above.

The layout range of target objects on the camera coordinate system or the image coordinate system is determined by the layout environment. For example, in the case of a task for picking up piled target parts using a robot in a cell or the like, the position where the image capturing unit 100 or the target parts are placed is limited, thus determining the layout range of the target objects.

In the image generation step, images are generated for the 36 orientations defined by the determined positions on the camera coordinate system and the above geodesic dome-like viewpoints, or for the 720 orientations in consideration of in-plane rotations. For example, when the position range on the camera coordinate system is limited to a range of 50 cm×50 cm×50 cm, and identification success rates are discretely defined at 5-cm intervals, XC=0, 5, . . . , 50, YC=0, 5, . . . , 50, and ZC=0, 5, . . . , 50 are defined. Therefore, identification success rates can be calculated at 720,000 points (=10×10×10×720), and 720,000 images are generated. When the number of images is large, images can be randomly sampled within the position range and orientation range. Alternatively, a target object may be actually placed on the work area (tray), and several images of it may be captured by the image capturing unit 100. In this case, the position and orientation of the captured target object on the camera coordinate system have to be separately input. A captured image of the target object is registered as that of the representative orientation closest to the input orientation. The registration destination is not limited to a specific one, and may be an appropriate memory in the information processing apparatus 200.
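The pose enumeration and the random subsampling mentioned above can be sketched as follows, assuming 10 grid steps per axis so that the count matches the 10×10×10×720 figure in the text; the rendering of each pose is left abstract.

```python
import itertools
import numpy as np

# 5-cm grid over a 50 cm x 50 cm x 50 cm working volume, 10 steps per axis.
grid = np.arange(0, 50, 5)            # XC, YC, ZC sample positions in cm
orientation_ids = range(720)          # 36 viewpoints x 20 in-plane rotations

poses = [(xc, yc, zc, oid)
         for xc, yc, zc in itertools.product(grid, grid, grid)
         for oid in orientation_ids]
assert len(poses) == 720_000

# When rendering this many images is impractical, subsample as the text suggests:
rng = np.random.default_rng(0)
sample = [poses[i] for i in rng.choice(len(poses), size=10_000, replace=False)]
# Each sampled pose would then be rendered (or captured) to produce a registered image.
```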

The reliability calculation step will be described below. In the reliability calculation step, identification success rates are calculated for the images registered in the above image generation step. The identification success rates are calculated using an identifier which has been learned in advance. When the identifier is one which calculates scores associated with respective states, the scores of the corresponding states are evaluated. Alternatively, as the identification success rates, similarities between registered images can be calculated, and lower identification success rates can be set for images with larger similarities, without executing actual identification processing.

Various identifiers are available. This embodiment will explain a method of estimating a final identification result by integration processing of votes from the identification results of weak identifiers. Various weak identifiers are also available. This embodiment will exemplify a feature point-based method and a method using a classification tree.

A learning method for identification processing by feature point-based voting will be described first. An image of a target object (a learning image) is captured in advance, and features are extracted from this learning image. As examples of features to be extracted, a feature point and a feature amount which describes the feature around the feature point are extracted. For example, a feature amount which describes information of the luminance gradient around a feature point, like SURF (H. Bay, "Speeded-Up Robust Features (SURF)", Computer Vision and Image Understanding, Vol. 110 (3), June 2008, pp. 346-359), may be used. In addition, feature points such as so-called Keypoints (E. Tola, "A Fast Local Descriptor for Dense Matching", CVPR 2008; K. Mikolajczyk, "A Performance Evaluation of Local Descriptors", PAMI, 27(10) 2004, pp. 1615-1630) may be used. Also, an image patch, edgelet, and the like may be used.

In this case, let x and y be image coordinates, fi=(xi, yi) (i=1, 2, . . . , N) be each feature point, and Fi (i=1, 2, . . . , N) be a feature amount vector which expresses a feature amount. N is the total number of feature points obtained from the learning image, and i is an index for each feature point.

As the learning images, images captured from the respective viewpoints (viewpoints 50) of the geodesic dome which surrounds the target object 31, as shown in FIG. 4 above, are used. Examples of the learning images are as shown in FIGS. 5A to 5E above. Feature points are acquired from the acquired learning images, and feature amounts which describe the areas around the feature points are acquired.

Next, learning of the identifier is performed. In this case, a learning method and an identification method will be described in which vectors to reference points of a target object corresponding to the feature points of a learning image are stored, and the class and position of the target object are detected by voting to the reference points set on the target object in association with the feature amounts, as in patent literature 1. In this case, the voting space is not particularly limited. For example, a space defined by the x- and y-axes of the image coordinate system and an ID axis (a class index representing a registered target object), a space defined by the x- and y-axes of the image coordinate system and a scale s axis, a space defined by the camera coordinate axes XC, YC, and ZC, and the like may be used.

In place of voting to the reference points, identification can also be attained by a method of making probabilistic votes from respective local features to the target object center, like the implicit shape model (non-patent literature 1) described in the description of the related art.

In the case of a multi-class problem, after voting is made for all classes, the class and position corresponding to the largest number of votes may be output as the identification result, or all detection points corresponding to numbers of votes equal to or larger than a preset threshold may be output as identification results.

This embodiment will exemplify a case in which feature points are extracted from an image, and voting is made to reference points set on a target object, thereby estimating the type and position of the target object. The practical processing at the time of learning will be described below with reference to FIG. 9, which shows the flowchart of that processing.

In step S900, the feature amounts Fi (i=1, 2, . . . , N) of the respective feature points fi of a learning image, and the class (an orientation or type of a target object; indicating one representative orientation in this embodiment) of that target object, are saved. In this case, let IDi (i=1, 2, . . . , N) be an index indicating the class of a target object. IDi assumes values ranging from 1 to P (P is the total number of classes).

In step S910, vectors Mij (i=1, 2, . . . , N, j=1, 2, . . . ) from the respective feature points fi of the learning image to reference points Oj (j=1, 2, . . . ) on the target object are calculated. The process in step S910 will be described below with reference to FIG. 10.

Initially, a vector 33 Mn=(xo−xn, yo−yn) from a feature point 32 fn=(xn, yn) of the target object 31 to a reference point 34 (the object center in this case) O=(xo, yo) set on the target object is calculated.

After the processes of steps S900 and S910 are executed for all learning images, all obtained feature points fi (i=1, 2, . . . , Nall) are clustered according to their feature amounts Fi (i=1, 2, . . . , Nall) in step S920.

In this case, Nall indicates the number of feature points obtained from all the learning images. As the clustering method, arbitrary clustering methods such as k-means, a self-organizing map algorithm, and the like can be used. For example, when k-means is used, the feature points can be clustered by defining the number K of clusters and using the Euclidean distances between the feature amounts Fi.

Finally, in step S930, representative vectors Fk′ (k=1, 2, . . . , K) of the respective clusters (K is the number of clusters, and k is the index of a cluster) and the feature points included in the clusters are saved, and are used in association with the feature amounts obtained in the identification processing.
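Steps S920 and S930 can be sketched as follows, assuming scikit-learn's KMeans as the clustering backend (the embodiment only requires some clustering method such as k-means). The array names are illustrative; the descriptors would come from SURF or a similar extractor, and 0-based class indices are assumed.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(features, vote_vectors, class_ids, n_clusters=256, seed=0):
    """Cluster feature descriptors (S920) and save, per cluster, the
    representative vector F'_k together with the vote vectors M_ij and
    class IDs of the member feature points (S930).

    features:     (Nall, D) array of descriptors F_i from all learning images
    vote_vectors: (Nall, 2) array of offsets M_i = (xo - xn, yo - yn)
    class_ids:    (Nall,) array of class indices ID_i
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(features)
    codebook = []
    for k in range(n_clusters):
        members = np.flatnonzero(km.labels_ == k)
        codebook.append({
            "representative": km.cluster_centers_[k],  # F'_k
            "votes": vote_vectors[members],            # M_ij of member points
            "class_ids": class_ids[members],           # ID_i of member points
        })
    return codebook
```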

Learning of the identifier may use the random forest method (Tin Kam Ho, U.S. Pat. No. 6,009,199), one of the ensemble learning methods. The random forest method executes pattern identification using a plurality of decision trees. In the random forest method, the respective nodes randomly issue information inquiries, and the learning patterns are divided in turn in accordance with the inquiry results, thus branching a decision tree. The learning patterns which remain when a leaf node is reached are stored as the classification result of that leaf node. In this embodiment, classification is made with the respective feature points of the respective learning images as learning patterns. As in the above learning, the vectors Mij (i=1, 2, . . . , N, j=1, 2, . . . ) from the respective feature points fi to the reference points Oj (j=1, 2, . . . ) on a target object, their feature amounts Fi (i=1, 2, . . . , N), and the class (an orientation or type of a target object) of that target object are saved.
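As a rough, simplified analogue, scikit-learn's RandomForestClassifier can classify each feature point's descriptor to a class ID; note that the variant described above additionally stores the vote vectors Mij at the leaves, which this off-the-shelf analogue omits. The stand-in data arrays are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 64))      # stand-in descriptors F_i
class_ids = rng.integers(0, 36, size=1000)  # stand-in class indices ID_i

# Ensemble of randomized decision trees; each tree partitions the learning
# patterns by randomized inquiries, and the forest integrates the tree outputs.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(features, class_ids)
print(forest.predict(features[:5]))         # per-point class predictions
```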

At the time of discrimination, the same inquiries as those at the time of learning are issued to trace from the root node to a leaf node. When a leaf node is reached, the stored pattern is output as the discrimination result of that decision tree. The discrimination results of all the decision trees are integrated by voting or the like to output a final discrimination result.

In the identification processing, using the learned identifier, a position and a class corresponding to an orientation of a target object are output. The practical processing will be described below with reference to FIG. 11, which shows the flowchart of that processing.

In step S1101, feature extraction from images is executed as in the learning processing. In step S1102, the distances between the feature amounts of the extracted feature points and the representative vectors of the respective clusters are calculated to determine the cluster having the highest similarity. When the random forest method is used, the feature points extracted from the images are classified to determine a leaf node. The same applies to a case in which other classification trees and identifiers are used.

Next, in step S1103, voting is made based on the class of the target object and the vectors to the reference points associated with the respective feature points in the cluster to which the representative vector determined by the association belongs. The voting space in this embodiment is defined by three axes, that is, the x- and y-axes of the image coordinate system and an axis which represents the class ID. FIG. 12 shows an example of such a voting space 70. Assume that the size of each cell is set in advance.

In a practical voting method, the cluster k″ having the highest similarity is determined by associating the feature amounts Gm of the respective feature points gm=(xm, ym) (m=1, . . . , M) extracted from captured images with the representative vectors Fk′ (k=1, . . . , K) of the respective clusters, which have been learned in advance. In this case, M is the total number of feature points extracted from the images, and m is its index.

For example, the cluster k″ is calculated using the Euclidean distances between the feature amounts Gm of the feature points gm and the representative vectors Fk′ (k=1, . . . , K) of the respective clusters according to:

$\begin{matrix}{k^{''} = {\underset{k}{\arg \; \min}\left\| {G_{m} - F_{k}^{\prime}} \right\|}} & (1)\end{matrix}$

According to the determined cluster, voting processing is executed in accordance with the vectors Mij (i=1, 2, . . . , N, j=1, 2, . . . ) to the reference points associated with the feature points fi included in that cluster and the class IDi (i=1, 2, . . . , N). More specifically, letting (x, y, ID) be a vote point on the voting space, we have:

$\begin{matrix}{{\left( {x,y} \right) = {\left( {x_{m},y_{m}} \right) + M_{ij}}},\;\;{ID = ID_{i}}} & (2)\end{matrix}$

In practice, the cell corresponding to the calculated (x, y, ID) is voted for. This processing is applied to all feature points in the determined cluster. In this case, the total number of votes is N×M at maximum.
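The matching and voting of steps S1102 and S1103 can be sketched as follows, assuming the codebook structure from the learning sketch above, 0-based class IDs, and an illustrative cell size; equation references in the comments point to equations (1) and (2).

```python
import numpy as np

def vote(image_features, image_points, codebook, image_shape, n_classes, cell=4):
    """Match each extracted feature to its nearest cluster (equation (1)) and
    cast votes into the (x, y, ID) space per equation (2)."""
    h, w = image_shape
    space = np.zeros((h // cell + 1, w // cell + 1, n_classes), dtype=np.int32)
    reps = np.stack([c["representative"] for c in codebook])
    for G, (xm, ym) in zip(image_features, image_points):
        k = int(np.argmin(np.linalg.norm(reps - G, axis=1)))   # nearest cluster k''
        for (mx, my), cid in zip(codebook[k]["votes"], codebook[k]["class_ids"]):
            x, y = xm + mx, ym + my                            # (x, y) = (xm, ym) + M_ij
            if 0 <= x < w and 0 <= y < h:
                space[int(y) // cell, int(x) // cell, cid] += 1
    return space   # step S1104 then extracts the cell(s) with the most votes
```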

Next, in step S1104, the cell having the maximum number of votes on the voting space voted in step S1103 is extracted. Alternatively, cells having numbers of votes equal to or larger than a predetermined threshold, or as many cells as a predetermined number (the number of candidates) in descending order of the number of votes, may be extracted. In this case, let P (P≧1) be the number of candidates, and (xp, yp, IDp) (p=1, 2, . . . , P) be each candidate.

With the aforementioned method, since the position of a target object on the image coordinate system can be calculated, if calibration between the camera coordinate system and the image coordinate system is executed in advance, the position on the camera coordinate system can be calculated from the calculated position on the image coordinate system. In this case, each candidate is converted into (XCp, YCp, ZCp, IDp) (p=1, 2, . . . , P) using the calibration result.

Identification results as many as the predetermined number of candidates are calculated for all the registered images. Next, the identification success rates corresponding to the positions and orientations on the corresponding camera coordinate system are calculated for the respective images. Thus, the distribution P_(R)(XC, YC, ZC, θcx, θcy, θcz) of identification success rates is calculated. This distribution is that of the identification success rates of target objects which take orientations (θcx, θcy, θcz) at arbitrary positions (XC, YC, ZC) on the camera coordinate system. In practice, the identification success rates at the values of (θcx, θcy, θcz) corresponding to the respective orientation IDs are calculated.

The identification success rates are calculated from the identification results for the respective images and a correct solution which is stored in advance. In the above identification method, the number of votes can be used as an identification score. For this reason, an identification success rate is calculated from the identification score SCORE₁ of the identification candidate p=1 obtained when identification is executed for the corresponding registered image, and the identification score SCORE_(TRUE) of the identification candidate p_(TRUE) which yields the correct solution, which is set in advance, using:

$\begin{matrix}{{P_{R}\left( {X_{C},Y_{C},Z_{C},\theta_{CX},\theta_{CY},\theta_{CZ}} \right)} = \frac{{SCORE}_{TRUE}}{{SCORE}_{1}}} & (3)\end{matrix}$

When the identification result of the candidate p=1 with respect to the registered image is TRUE, the identification success rate P_(R) assumes the maximum value "1". When no candidate includes an identification result of TRUE, SCORE_(TRUE) assumes "0", and the identification success rate P_(R) also assumes "0". Alternatively, a method of setting the identification score as the identification success rate when the identification result of the candidate p=1 is TRUE, or a method of setting the identification success rate to "1" when the identification result of the candidate p=1 is TRUE and to "0" when it is FALSE, can be used. The identification success rates are calculated for all the registered images, and P_(R)(XC, YC, ZC, θcx, θcy, θcz) is used in the next process.
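Equation (3) and its edge cases reduce to a small function; a minimal sketch, with argument names chosen for illustration:

```python
def identification_success_rate(scores, true_candidate):
    """Equation (3): ratio of the correct candidate's score to the top score.

    scores:         identification scores, best candidate first (SCORE_1 = scores[0])
    true_candidate: index of the candidate matching the stored correct solution,
                    or None if no candidate is TRUE (then SCORE_TRUE = 0).
    """
    if true_candidate is None or not scores:
        return 0.0
    return scores[true_candidate] / scores[0]  # equals 1 when the top candidate is TRUE
```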

In step S130, the image capturing unit control unit 230 calculates the inner product of the success rate distribution and the probability distribution for the predetermined relative target object states with respect to each of a plurality of positions and orientations that the image capturing unit 100 can take. Then, the image capturing unit control unit 230 determines the position and orientation corresponding to the maximum inner product value.

Letting R be a rotation matrix including the orientation components of the image capturing unit 100 on the world coordinate system, and T be a translation matrix including the position components of the image capturing unit 100 on the world coordinate system, equation (4) below is calculated for a plurality of different matrices T and R:

$\begin{matrix}{\left( {\hat{R},\hat{T}} \right) = {\underset{R,T}{\arg \; \max}\left( {{P_{P}\left( {X,Y,Z,\theta_{X},\theta_{Y},\theta_{Z}} \right)} \cdot {f\left( {{P_{R}\left( {X_{C},Y_{C},Z_{C},\theta_{CX},\theta_{CY},\theta_{CZ}} \right)},R,T} \right)}} \right)}} & (4)\end{matrix}$

where f(P_(R)(XC, YC, ZC, θcx, θcy, θcz), R, T) is a function which converts the success rate distribution defined on the camera coordinate system into one on the world coordinate system using the matrices R and T.

Note that we have φ={R, T} and X={X, Y, Z, θx, θy, θz}. Then f(P_(R)(XC, YC, ZC, θcx, θcy, θcz), R, T) can be rewritten as P_(R)(X, Y, Z, θx, θy, θz|R, T). Therefore, equation (4) can be rewritten as:

$\begin{matrix}{\hat{\varphi} = {\underset{\varphi}{\arg \; \max}\left( {{P_{P}(X)} \cdot {P_{R}\left( X \middle| \varphi \right)}} \right)}} & (5)\end{matrix}$

Letting L(φ)=P_(P)(X)·P_(R)(X|φ) in equation (5), log L(φ) need only be maximized. This can be solved by repeating the update expression given by:

$\begin{matrix}{\hat{\varphi}\leftarrow{\hat{\varphi} + {\left. \varepsilon\frac{\partial\log L}{\partial\varphi} \right|_{\varphi = \hat{\varphi}}}}} & (6)\end{matrix}$

where ε is a small, positive scalar value. With this processing, R and T need only be calculated so as to increase the inner product of the success rate distribution and the probability distribution on the world coordinate system. In this inner product processing, as given by the above expression, the sum total of the products of the values (success rates and probability values), each pair taken for the same target object state of the converted success rate distribution and the probability distribution, is calculated over all target object states (in practice, over the overlapping portion of the respective distributions).
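A discrete-search variant of equation (4), evaluating the inner product for a finite set of candidate camera poses rather than performing the gradient update of expression (6), can be sketched as follows; `transform_success_rates` stands in for the function f and is an assumed helper, and the distributions are taken as arrays over a common discretization of target object states.

```python
import numpy as np

def choose_camera_pose(candidates, P_P, P_R, transform_success_rates):
    """Evaluate the inner product of the occurrence distribution P_P and the
    world-frame success-rate distribution for each candidate pose (R, T),
    and keep the pose with the largest value (cf. equation (4)).

    transform_success_rates(P_R, R, T): stand-in for f(P_R, R, T), resampling
    the camera-frame success rates onto the world-frame state grid.
    """
    best_pose, best_score = None, -np.inf
    for R, T in candidates:
        P_R_world = transform_success_rates(P_R, R, T)
        score = np.sum(P_P * P_R_world)   # inner product over shared states
        if score > best_score:
            best_pose, best_score = (R, T), score
    return best_pose, best_score
```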

Note that the position and orientation of the image capturing unit 100 corresponding to the maximum inner product of the occurrence probability distribution and the identification success rate distribution are calculated in this embodiment. Alternatively, the update expression given by expression (6) may be applied at least once to determine the position and orientation of the image capturing unit 100. Also, the initial values of the position and orientation of the image capturing unit 100 may be given randomly or by the user.

In step S140, the target object state identification unit 240 updates the position and orientation of the image capturing unit 100 to those determined in step S130, and then acquires a captured image captured by the image capturing unit 100. The target object state identification unit 240 then identifies (estimates) the target object states of the target objects which appear in the acquired captured image. In this identification, the identifier used in step S120 above may be used again, or another identifier may be separately prepared.

As described above, according to this embodiment, since the position and orientation of the image capturing device are determined based on the probabilities of occurrence and the identification success rates for the respective target object states, the accuracy of identifying target objects from an image captured by this image capturing device can be improved.

Second Embodiment

An example of the functional arrangement of an information processing apparatus according to this embodiment will be described below with reference to the block diagram shown in FIG. 13. In the arrangement shown in FIG. 13, the same reference numerals denote the same components as those shown in FIG. 1, and a description thereof will not be repeated. In the arrangement shown in FIG. 13, an image capturing unit changing unit 250 is added to the arrangement shown in FIG. 1.

Processing to be executed by the information processing apparatus according to this embodiment will be described below with reference to FIG. 14, which shows the flowchart of that processing. In the flowchart shown in FIG. 14, steps S250 to S270 are added to the flowchart shown in FIG. 2.

The target object state identification unit 240 checks in step S250 whether or not the target object state identification processing is to be continued. If it is determined that the identification processing is to be continued, the process advances to step S260. The image capturing unit control unit 230 checks in step S260 whether or not the position and orientation of the image capturing unit 100 are required to be changed. As a result of this checking process, if the position and orientation are required to be changed, the process advances to step S270. If the position and orientation need not be changed, the process returns to step S240. In step S270, the image capturing unit control unit 230 changes the position and orientation of the image capturing unit 100.

This embodiment assumes a situation in which the state of the piled target objects changes. For example, this embodiment assumes a situation in which the robot 20 or the like picks up target objects in turn.

In step S210, in addition to the processing in step S110, the image capturing unit control unit 230 executes processing for storing variables indicating the number of target objects and the height of the piled target objects in an appropriate memory in the information processing apparatus. Thus, a distribution which changes according to the number and height of the target objects can be stored. More specifically, occurrence probability distributions for the respective numbers and heights of target objects are learned, so that the occurrence probability distribution to be loaded at the time of identification can be changed. When the number and height of the target objects decrease at the time of identification, the distribution to be loaded is changed, and the position and orientation of the image capturing unit 100 are changed based on that distribution to improve the identification accuracy.

In steps S220 to S240, the same processes as those in steps S120 to S140 above are executed. Note that in step S240, in addition to the aforementioned processing, information indicating a target object whose target object state has been identified is stored in an appropriate memory in the information processing apparatus, or this target object is picked up by the robot 20 or the like so as to be excluded from the identification targets of the subsequent identification processing.

The target object state identification unit 240 checks in step S250 whether or not the identification processing is to be continued. The criteria as to whether or not the identification processing is to be continued are not limited to specific criteria. For example, it may be determined that the identification processing is not to be continued if the current number of target objects becomes equal to or smaller than a prescribed value. The current number of target objects can be obtained by recognizing the target objects which appear in an image captured by the image capturing unit 100 and counting the number of recognized target objects.

The image capturing unit changing unit 250 checks in step S260 whether or not the position and orientation of the image capturing unit 100 are to be changed. Initially, in the aforementioned occurrence probability distribution, the probability of occurrence of the position and orientation corresponding to a target object whose target object state was identified in step S240 is deleted (set to "0"), or the occurrence probability distribution corresponding to the number and height of the target objects is updated. The number of target objects is input at the time of design. The height of the target objects is measured by a distance measurement sensor such as a TOF sensor. Next, the inner product of the updated occurrence probability distribution and the identification success rate distribution is calculated in the same manner as in the first embodiment, and when the inner product result is smaller than a certain predetermined value, it is determined that the position and orientation of the image capturing unit 100 are to be changed. Alternatively, when the number of target objects is smaller than a certain predetermined value, it is determined that the position and orientation of the image capturing unit 100 are to be changed.
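The update and the threshold check of step S260 can be sketched as follows, assuming both distributions are flattened arrays over the same discretized state grid; the function and argument names are illustrative.

```python
import numpy as np

def should_move_camera(P_P, identified_states, P_R_world, threshold):
    """Zero out the probabilities of already-identified states, then re-check
    the inner product against a preset threshold.

    identified_states: indices into the state grid of objects already handled.
    """
    P_P = P_P.copy()
    P_P[identified_states] = 0.0     # delete identified states (set to "0")
    score = np.sum(P_P * P_R_world)  # same inner product as in the first embodiment
    return score < threshold         # True -> change the camera pose (step S270)
```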

If it is determined that the position and orientation of the image capturing unit 100 are to be changed, the process advances to step S270; otherwise, the process returns to step S240.

In step S270, the image capturing unit control unit 230 changes the position and orientation of the image capturing unit 100 using the above expression (6). Alternatively, the position and orientation of the moving destination of the image capturing unit 100 may be simulated in advance based on the occurrence probability distribution corresponding to the number of target objects.

As described above, according to this embodiment, in addition to the effects of the first embodiment, the accuracy of identifying target objects from an image captured by the image capturing unit 100 can be improved by changing at least the position or orientation of the image capturing unit 100 during identification.

Third Embodiment

In this embodiment, the target object states of the target objects shown in FIGS. 15 and 16 are identified. In this case, the information processing apparatus 200 also stores the deformation degrees of the target objects, in addition to what is stored in the first and second embodiments. Note that the information processing apparatus described in the first and second embodiments is applicable to this embodiment.

A target object (joint object) shown in FIG. 15 is formed of a reference object 300 and another object 301. In FIG. 15, the object 301 is tilted through an appropriate angle 302 with respect to the reference object 300. In this case, the deformation degree corresponds to this angle 302. The position and orientation of the joint object shown in FIG. 15 are those of the reference object 300. Therefore, identification of the target object state of the joint object shown in FIG. 15 amounts to identifying the position, orientation, and deformation degree of the reference object 300.

A target object (deformed object) 303 shown in FIG. 16 is deformed by a curvature 304 with respect to an axis 305. Therefore, identification of the target object state of the deformed object 303 shown in FIG. 16 amounts to identifying the position of the deformed object 303, the direction of the axis 305, and the curvature 304.

In the case of this embodiment, the processing according to the flowchart shown in FIG. 2 is basically executed. However, in step S110, in addition to the first embodiment, the angle 302 in the case of the joint object shown in FIG. 15, and the direction of the axis 305 and the curvature 304 in the case of the deformed object shown in FIG. 16, are stored as variables in addition to the position/orientation on the world coordinate system. Also, as in the second embodiment, the number of target objects may be stored.

As for the subsequent processes, the same processes as in the first and second embodiments are executed. In this embodiment, in addition to these processes, a joint object and a deformed object are identified. Hence, only the identification method of these objects is different.

In the case of the joint object shown in FIG. 15, the position and orientation of the reference object 300 are estimated. The estimation method can be the same as the identification method of target objects in the first embodiment. After the position and orientation of the reference object 300 are estimated, the possible existence area of the object 301 with respect to that reference object 300 is calculated. Next, the position and orientation of the object 301 are identified from the possible existence area of the object 301. The identification method is the same as that of target objects in the first embodiment. In this manner, the angle 302 between the reference object 300 and the object 301 can be calculated.

In the case of the deformed object shown in FIG. 16, the direction of the axis 305 and the curvature 304 with respect to that axis 305 can be estimated by identifying the position of one end portion of the deformed object 303 from an image, and searching for edges and the like from that end portion toward the other end portion of the deformed object 303.

As described above, according to this embodiment, since the position and orientation of the image capturing unit 100 are determined based on the probabilities of occurrence and the identification success rates for the respective states, including the deformation degrees and curvatures of target objects, the accuracy of identifying target objects from an image captured by this image capturing device can be improved. Also, since the existence positions or areas of target objects are determined based on the probabilities of occurrence and the identification success rates, the accuracy of identifying target objects from an image captured by this image capturing device can be improved.

Fourth Embodiment

The respective units included in the information processing apparatus 200 shown in FIGS. 1 and 13 may be implemented by hardware. Alternatively, the occurrence probability storage unit 210 and the identification reliability storage unit 220 may be implemented by a memory such as a RAM or a hard disk, and the remaining units may be implemented by computer programs. In this case, these computer programs are stored in this memory, and are executed by a processor such as a CPU.

Therefore, a computer having at least this memory and processor is applicable to the information processing apparatus 200. The aforementioned image capturing unit 100 may be connected to this computer and may input captured images to this computer, or captured images may be stored in advance in the hard disk.

Other Embodiments

Aspects of the present invention can also be realized by a computer of a system or apparatus (or devices such as a CPU or MPU) that reads out and executes a program recorded on a memory device to perform the functions of the above-described embodiment(s), and by a method, the steps of which are performed by a computer of a system or apparatus by, for example, reading out and executing a program recorded on a memory device to perform the functions of the above-described embodiment(s). For this purpose, the program is provided to the computer, for example, via a network or from a recording medium of various types serving as the memory device (for example, a computer-readable medium).

While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

This application claims the benefit of Japanese Patent Application No. 2011-264125, filed Dec. 1, 2011, which is hereby incorporated by reference herein in its entirety.

What is claimed is:
 1. An information processing apparatus, comprising: a first acquisition unit configured to acquire a probability at which a target object takes a target object state for each of target object states that the target object is allowed to take, and configured to acquire a distribution of the acquired probabilities; a second acquisition unit configured to acquire a success rate, for each relative target object state being determined in advance for a position and an orientation of an image capturing device, at which the target object is successfully identified from a captured image obtained by capturing, by the image capturing device, the target object having the relative target object state, and configured to acquire a distribution of the acquired success rates; and a determination unit configured to determine a position and orientation that the image capturing device is to take based on the distribution of the success rates acquired by said second acquisition unit for each of a plurality of positions and orientations that the image capturing device is allowed to take, and the distribution of the probabilities acquired by said first acquisition unit.
 2. The apparatus according to claim 1, further comprising: a changing unit configured to change a position and an orientation of the image capturing device to the position and the orientation determined by said determination unit; and a unit configured to identify the target object state of the target object which appears in an image captured by the image capturing device, the position and orientation of which are changed by said changing unit.
 3. The apparatus according to claim 1, further comprising: a unit configured to update a probability value corresponding to the target object state to be 0 in the distribution of the probabilities after the target object state of the target object is identified.
 4. The apparatus according to claim 1, wherein said first acquisition unit acquires the distribution of the probabilities, which is generated in advance, and said second acquisition unit acquires the distribution of the success rates, which is generated in advance.
 5. The apparatus according to claim 1, wherein said determination unit calculates an inner product of the distribution of the success rates acquired by said second acquisition unit and the distribution of the probabilities acquired by said first acquisition unit for each of the plurality of positions and orientations that the image capturing device is allowed to take, and determines a position and an orientation corresponding to a maximum value of the calculated inner products as a position and an orientation that the image capturing device is to take.
 6. The apparatus according to claim 1, wherein the target object state is defined by a position and/or an orientation of the target object.
 7. An information processing method, comprising: a first acquisition step of acquiring a probability at which a target object takes a target object state for each of target object states that the target object is allowed to take, and of acquiring a distribution of the acquired probabilities; a second acquisition step of acquiring a success rate, for each relative target object state being determined in advance for a position and an orientation of an image capturing device, at which the target object is successfully identified from a captured image obtained by capturing, by the image capturing device, the target object having the relative target object state, and of acquiring a distribution of the acquired success rates; and a determination step of determining a position and orientation that the image capturing device is to take based on the distribution of the success rates acquired in the second acquisition step for each of a plurality of positions and orientations that the image capturing device is allowed to take, and the distribution of the probabilities acquired in the first acquisition step.
 8. A non-transitory computer-readable storage medium storing a computer program for controlling a computer to function as the respective units of an information processing apparatus of claim 1.